Big Data From Scientific Simulations
Abstract
Scientific simulations often generate massive amounts of data used for debugging, restarts, and scientific analysis and discovery. Practitioners working with this kind of big data face unique challenges. Of primary importance is the speed of writing data during a simulation, but this need for fast I/O is at odds with other priorities, such as data access time for visualization and analysis, efficient storage, and portability across a variety of supercomputer topologies, configurations, file systems, and storage devices. The computational power of high-performance computing systems continues to increase according to Moore’s law, but the same is not true for I/O subsystems, creating a performance gap between computation and I/O. This chapter explores these issues, as well as possible optimization strategies, the use of in situ analytics, and a case study using the PIDX I/O library in a typical simulation.
Related resources
Big Data – The New Science of Complexity
Data-intensive techniques, now widely referred to as ‘big data’, allow for novel ways to address complexity in science. I assess their impact on the scientific method. First, big-data science is distinguished from other scientific uses of information technologies, in particular from computer simulations. Then, I sketch the complex and contextual nature of the laws established by data-intensive ...
The Need for Resilience Research in Workflows of Big Compute and Big Data Scientific Applications
Projections and reports about exascale failure modes conclude that we need to protect numerical simulations and data analytics from an increasing risk of hardware and software failures and silent data corruptions (SDC) [1, 4]. At this scale, hardware and software failures could be as frequent as ten or more per day. According to [9], the semiconductor industry will have increased difficulty pre...
HPC and Big Data Convergence for Extreme Heterogeneous Systems
As the data deluge grows ever greater, large-scale data analytics workloads are quickly becoming critical computational tools within the scientific community. Recently, convergence efforts have focused on combining aspects of HPC and "big data" analytics workloads together using a unified supercomputing system. This has the opportunity to bring advanced analytical tools to scientists which enable ...
Orchestrating Science DMZs for Big Data Acceleration: Challenges and Approaches
In recent years, most scientific research in both academia and industry has become increasingly data-driven. According to market estimates, spending related to supporting scientific data-intensive research is expected to increase to $5.8 billion by 2018 [1]. Particularly for data-intensive scientific fields such as bioscience, or particle physics within academic environments, data storage/process...
Toward A Unified HPC and Big Data Runtime
The landscape of high performance computing (HPC) has radically changed over the past decade as the community has well surpassed Petascale performance and aims for Exascale. In this effort, chip fabrication and hardware architects have been directly challenged by the fundamentals of physics of chip manufacturing. The effects of these challenges have extended beyond the underlying hardware requi...
Publication date: 2014